Abstract. The quality of earthquake prediction is usually characterized by a two-dimensional diagram $n$ vs. $\tau$, where $n$ is the rate of failures-to-predict and $\tau$ is a characteristic of the space-time alarm. Unlike the time prediction case, the quantity $\tau$ is not defined uniquely, so the properties of the $(n, \tau)$ diagram require a theoretical analysis, which is the main goal of the present study. This note is based on a recent paper by Molchan and Keilis-Borok in GJI, 173 (2008), 1012-1017.
1 Introduction

The sequence of papers (Molchan 1990, 1991, 1997, 2003) treats earthquake prediction as a decision-making problem. The basic notions in this approach are the strategy, $\pi$, and the goal function, $\varphi$. Any strategy is a sequence of decisions $\pi(t)$ about an alarm of some type for the next time segment $(t, t + \delta)$, $\delta \ll 1$; $\pi(t)$ is based on the data $I(t)$ available at time $t$. The goal of prediction is to minimize $\varphi$, and the mathematical problem consists in describing the optimal strategy. Molchan (1997) considered the problem under the conditions in which target events form a random point process $dN(t)$ ($N(t)$ is the number of events in the interval $(0, t)$), and the aggregate $\{dN(t), I(t), \pi(t)\}$ is stationary.

Dealing with the prediction of time, Molchan (1997) considered, along with the general case, the situation in which the optimal strategy is locally optimal, i.e., optimal on every time segment. This case arises when the goal function has the form $\varphi(n, \tau)$, where $n, \tau$ are the standard prediction characteristics (errors): $n$ is the rate of failures-to-predict and $\tau$ is the rate of alarm time. The optimal strategy can then be described in much simpler terms: it is expressed through the conditional rate of target events

$$ r(t) = P\{dN(t) > 0 \mid I(t)\}/dt, \tag{1} $$

the loss function $\varphi$, and the error diagram $n(\tau)$. The last function can be defined as the lower boundary of the set $\mathcal{E} = \{(n, \tau)\}$; this set consists of the $(n, \tau)$ characteristics of all strategies based on $I(t)$. The search for the optimal strategy on a small time segment $(t, t + \delta)$ reduces to the classical testing of two simple hypotheses, such that the errors of the two kinds, $(\beta(\alpha), \alpha)$ (Lehmann, 1959), converge to $(n(\tau), \tau)$ as $\delta \downarrow 0$. In statistical applications the curve $1 - \beta(\alpha)$ is known as the ROC diagram, or Relative/Receiver Operating Characteristic (Swets, 1973); its limit in the case of the locally optimal strategy gives the curve $1 - n(\tau)$.
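A minimal sketch of how thresholding the conditional rate (1) traces the error diagram in the time-prediction case. It assumes, for illustration only, that time is cut into equal bins and that $r(t)$ is known exactly in each bin, so the expected share of target events in a bin is proportional to $r$; the function `time_error_diagram` is hypothetical. Lowering the threshold $r_0$ switches bins on in order of decreasing rate.

```python
from typing import List, Sequence, Tuple


def time_error_diagram(rates: Sequence[float]) -> List[Tuple[float, float]]:
    """(tau, n) vertices of the strategies 'alarm wherever r(t) > r0'.

    rates[j] is the conditional rate r(t) in the j-th of len(rates) equal time
    bins; it is assumed known exactly, so the expected number of target events
    in a bin is proportional to r.
    """
    total = sum(rates)
    m = len(rates)
    points = [(0.0, 1.0)]  # r0 above every rate: no alarm, all events missed
    caught = 0.0
    # Lowering r0 switches bins on in order of decreasing conditional rate.
    for j, r in enumerate(sorted(rates, reverse=True), start=1):
        caught += r
        points.append((j / m, 1.0 - caught / total))
    return points


diagram = time_error_diagram([4.0, 1.0, 1.0, 2.0])
# [(0.0, 1.0), (0.25, 0.5), (0.5, 0.25), (0.75, 0.125), (1.0, 0.0)]
```

Joining consecutive points (mixing neighbouring thresholds at random) gives a convex, decreasing curve. With a constant rate, e.g. `time_error_diagram([1.0] * 4)`, every point lies on the diagonal $n + \tau = 1$ of the trivial strategies.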



The error diagram $n(\tau)$ has proved to be so convenient a tool for the analysis of prediction methods that it has also come to be used for the space-time prediction of target events. In that case the part of $\tau$ is played by a weighted mean of $\tau$ over space. To be specific, we divide the space $G$ into nonintersecting parts $\{G_i\}$ and denote by $\tau_i$ the alarm time rate in $G_i$ for the strategy $\pi$. The space-time alarm is then measured by

$$ \tau_w = \sum_{i=1}^{k} w_i \tau_i, \qquad \sum_{i=1}^{k} w_i = 1, \quad w_i > 0, \tag{2} $$



where the $\{w_i\}$ depend on the prediction goals; e.g., at the research stage of prediction one uses

$$ w_i = |G_i|/|G| \quad (\text{area of } G_i / \text{area of } G) \tag{3} $$

(Tiampo et al., 2002; Shen et al., 2007; Zechar and Jordan, 2008; Shcherbakov et al., 2008) or

$$ w_i = \lambda(G_i)/\lambda(G), \tag{4} $$

where $\lambda(G)$ is the rate of target events in $G$ (Keilis-Borok and Soloviev, 2003; Kossobokov, 2005). When dealing with the social and economic aspects of prediction, it is advisable to use weights of the form

$$ w_i = \int_{G_i} p(g)\,dg \Big/ \int_{G} p(g)\,dg, \tag{5} $$

where $p(g)$ is, e.g., the population density in $G$.
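The three weightings (3), (4) and (5) can give very different values of $\tau_w$ for one and the same alarm. The sketch below uses a hypothetical three-cell partition in which each cell is summarized by its area, its target-event rate $\lambda(G_i)$ and its integrated population $\int_{G_i} p(g)\,dg$; all numbers and helper names are illustrative.

```python
from typing import List, Sequence


def normalized(values: Sequence[float]) -> List[float]:
    """Scale nonnegative cell totals so that the weights sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def tau_w(weights: Sequence[float], taus: Sequence[float]) -> float:
    """Weighted alarm rate (2): tau_w = sum_i w_i * tau_i."""
    return sum(w * t for w, t in zip(weights, taus))


# Hypothetical partition into three cells.
areas = [1.0, 1.0, 2.0]          # |G_i|
rates = [6.0, 1.0, 1.0]          # lambda(G_i)
population = [10.0, 30.0, 60.0]  # integral of p(g) over G_i

w_area = normalized(areas)       # weights (3)
w_rate = normalized(rates)       # weights (4)
w_pop = normalized(population)   # weights (5)

taus = [1.0, 0.0, 0.0]           # alarm all the time in G_1 only
print(tau_w(w_area, taus), tau_w(w_rate, taus), tau_w(w_pop, taus))  # 0.25 0.75 0.1
```

The same alarm costs 0.1, 0.25 or 0.75 in units of $\tau_w$ depending on the weighting, which is why the choice of $\{w_i\}$ matters for everything that follows.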

The $n(\tau_w)$ diagrams, constructed by analogy with the error diagram, are frequently also credited with the properties of $n(\tau)$. We now list those properties which, in the case of $n(\tau_w)$, either must be specified more precisely or are wrong:

(a) $n(\tau)$ characterizes the limiting prediction capability of the data $\{I(t)\}$. That means that the minimum of any loss function $\varphi(n, \tau)$ with convex level sets $\{\varphi \le c\}$ is reached on the curve $n(\tau)$;

(b) $\varphi$ and $n(\tau)$ define the optimal strategy and its characteristics $(n, \tau)$;

(c) the diagonal $D$ of the square $[0,1]^2$, $n + \tau = 1$, is the antipode of $n(\tau)$, because it describes the characteristics of all trivial strategies, which are equivalent to random-guess strategies. Therefore, the maximum distance between $n(\tau)$ and $D$, i.e., $\max_\tau (1 - n(\tau) - \tau)/\sqrt{2}$, characterizes the prediction potential of $\{I(t)\}$;

(d) $1 - n(\tau)$ is a ROC diagram arising in the testing of simple statistical hypotheses.

Molchan and Keilis-Borok (2008) recently considered the space-time prediction of target events under conditions where the optimal strategies coincide with the locally optimal ones (the word "locally" now refers to both space and time). That paper gives a correct extension of the error diagram, which provides the key to understanding the information contained in an $n(\tau_w)$ diagram. The present note supplements the above-mentioned study: we refine the structure of the error diagram for space-time prediction and analyze the properties of two-dimensional $n(\tau_w)$ diagrams.

2 The Error Diagram 

We quote the main result by Molchan and Keilis-Borok (2008) relevant to 
the prediction of space-time for target events. 

Let $\{G_i\}$ be some partition of $G$ into nonintersecting regions. Predicting location means indicating the $\{G_i\}$ where a target event will occur. Consequently, the model of target events in $G$ is the stationary random vector point process

$$ dN(t) = \{dN_1(t), \dots, dN_k(t)\}, $$

whose components describe target events in the $\{G_i\}$. We shall consider binary yes/no prediction with the decisions

$$ \pi(t) = \{\pi_1(t), \dots, \pi_k(t)\}, \qquad t = m\delta \ (m \text{ an integer}), $$

of the form

$$ \pi_i(t) = \begin{cases} \text{alarm in } G_i \times (t, t + \delta), \\ \text{no alarm in } G_i \times (t, t + \delta). \end{cases} $$

The decision $\pi(t)$ is based on the data $I(t)$ available at time $t$.

Under certain conditions, namely, that the aggregate $\{dN(t), I(t), \pi(t)\}$ is ergodic and stationary and, moreover, $P\{\sum_{i=1}^{k} dN_i(t) > 1\} = o(dt)$, the basic characteristics of the strategy $\pi = \{\pi(t)\}$ are defined as the limits of its empirical means. We have in mind the rate of failures-to-predict $n$ and the vector

$$ \tau = (\tau_1, \dots, \tau_k), $$

which gives the alarm time rates in the $\{G_i\}$. The quantities $(n, \tau)$ are defined for any small $\delta$. We shall assume that $n$ and $\tau$ have limits as $\delta \downarrow 0$, for which we retain the same notation. The passage to the limit is not a restriction, since the data may reflect the seismic situation with a fixed time delay.
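In practice the limits defining $n$ and $\tau_i$ are replaced by empirical means over a finite catalogue. A minimal sketch, assuming (hypothetically) that alarms are stored as one 0/1 list per region over $m$ equal time bins and that target events are given as (region, bin) pairs.

```python
from typing import List, Sequence, Tuple


def empirical_errors(alarms: Sequence[Sequence[int]],
                     events: Sequence[Tuple[int, int]]) -> Tuple[float, List[float]]:
    """Empirical rate of failures-to-predict n and alarm-time rates tau_i.

    alarms[i][t] == 1 if an alarm is on in G_i during time bin t;
    events lists the (region index, time bin) of every target event.
    """
    taus = [sum(row) / len(row) for row in alarms]
    missed = sum(1 for i, t in events if alarms[i][t] == 0)
    return missed / len(events), taus


alarms = [[1, 1, 0, 0],     # G_1: alarm during the first half of the period
          [0, 0, 0, 1]]     # G_2: alarm in the last bin only
events = [(0, 0), (0, 3), (1, 3), (1, 1)]
n, taus = empirical_errors(alarms, events)   # n = 0.5, taus = [0.5, 0.25]
```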

The set of $(n, \tau)$ characteristics of the different strategies $\pi$ based on $\{I(t)\} = I$ is a convex subset of the $(k+1)$-dimensional unit cube, i.e., the error set

$$ \mathcal{E}(I) = \{(n, \tau)_\pi : \pi \text{ based on } I\} \subset [0,1]^{k+1} \tag{6} $$

(see Fig. 1). The set $\mathcal{E}$ contains the simplex

$$ D = \Big\{(n, \tau) : n + \sum_{i=1}^{k} \lambda_i \tau_i/\lambda = 1,\ 0 \le n, \tau_i \le 1\Big\}, \tag{7} $$

where $\lambda_i = \lambda(G_i)$ and $\lambda = \lambda(G) = \sum_i \lambda_i$. The set (7) describes strategies that are equivalent to random-guess strategies. Indeed, if an alarm is declared in $G_i$ at the rate $\tau_i$, then $\lambda_i \tau_i/\lambda$ gives the rate of random successes in $G_i$. The equality in (7), i.e., $1 - n = \sum_{i=1}^{k} \lambda_i \tau_i/\lambda$, means that the success rate is identical with the rate of random successes. Such strategies will be called trivial.
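The claim that a random guess lands on the simplex (7) can be checked by simulation. A minimal sketch, assuming target events in $G_i$ occur independently on a fine time grid with a small probability per bin (a stand-in for Poisson streams) and that the alarm in $G_i$ is on with probability $\tau_i$, independently of the events; all parameter values are hypothetical.

```python
import random
from typing import Sequence


def random_guess_success(lams: Sequence[float], taus: Sequence[float],
                         bins: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of 1 - n for a random-guess strategy.

    lams[i]: probability of a target event in G_i per time bin (small).
    taus[i]: probability that the alarm is on in G_i, independent of events.
    """
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(bins):
        for lam, tau in zip(lams, taus):
            if rng.random() < lam:          # a target event occurs in G_i
                total += 1
                # The alarm state matters only in bins with an event,
                # so the independent alarm is drawn only there.
                if rng.random() < tau:
                    hits += 1
    return hits / total


lams = [0.05, 0.01, 0.01]
taus = [0.2, 0.9, 0.5]
predicted = sum(l * t for l, t in zip(lams, taus)) / sum(lams)  # right side of (7)
estimate = random_guess_success(lams, taus)
```

With about 14 000 simulated events the estimate should match $\sum_i \lambda_i\tau_i/\lambda \approx 0.343$ up to Monte Carlo noise (standard error roughly 0.004 here).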

The boundary of $\mathcal{E}$, viz. $n(\tau)$, which lies below the hyperplane (7), will be called the error diagram. To describe the properties of $n(\tau)$, we define the loss function $\varphi$. This will be a function of the form $\varphi(n, \tau)$ that is nondecreasing in each argument and for which every level set $\{\varphi \le c\}$ is convex.

The following is true. 

2.1. The minimum of $\varphi(n, \tau)$ on $\mathcal{E}$ is reached on the surface $n(\tau)$. The point of the minimum, $Q$, is found as the point where the suitable level set $\{\varphi \le c\}$ is tangent to $n(\tau)$ (see Fig. 1). The coordinates of $Q = (n, \tau)$ define the characteristics of the optimal strategy with respect to the goal function $\varphi$;

2.2. The optimal strategy declares an alarm in $G_i \times [t, t + \delta)$, $\delta \ll 1$, as soon as

$$ r_i(t) = P\{\delta N_i(t) > 0 \mid I(t)\}/\delta > r_{0i}, \tag{8} $$

and declares no alarm otherwise;

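2.3. The threshold $r_{0i}$ depends on $\varphi$; e.g., if

$$ \varphi = a\lambda n + \sum_{i=1}^{k} b_i \tau_i, \tag{9} $$

then $r_{0i} = b_i/a$. In the general case one has

$$ r_{0i} = \lambda \, \frac{\partial \varphi}{\partial \tau_i}(Q) \Big/ \frac{\partial \varphi}{\partial n}(Q). $$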
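Statement 2.3 for the linear loss (9) can be checked by brute force on a toy problem. The sketch assumes, hypothetically, that time is cut into $m$ equal bins and that $r_i(t)$ is known exactly in each bin, so $\lambda_i$ is the time average of $r_i$ and the expected rate of predicted events is computed from the rates; the function names are illustrative.

```python
from itertools import product
from typing import List, Sequence

Grid = Sequence[Sequence[float]]  # rates[i][t] = r_i(t) on m equal time bins


def loss(alarms: Sequence[Sequence[int]], rates: Grid,
         a: float, b: Sequence[float]) -> float:
    """Loss (9), phi = a*lambda*n + sum_i b_i*tau_i, for a 0/1 alarm pattern."""
    m = len(rates[0])
    lam = sum(sum(row) for row in rates) / m          # lambda = sum_i lambda_i
    caught = sum(r * s for row, srow in zip(rates, alarms)
                 for r, s in zip(row, srow)) / m      # rate of predicted events
    n = 1.0 - caught / lam
    taus = [sum(srow) / m for srow in alarms]
    return a * lam * n + sum(bi * ti for bi, ti in zip(b, taus))


def threshold_alarms(rates: Grid, a: float, b: Sequence[float]) -> List[List[int]]:
    """Rule (8) with r_0i = b_i / a: alarm in G_i wherever r_i(t) > b_i / a."""
    return [[1 if r > bi / a else 0 for r in row] for row, bi in zip(rates, b)]


def brute_force_min(rates: Grid, a: float, b: Sequence[float]) -> float:
    """Minimum of (9) over all 2**(k*m) alarm patterns (tiny examples only)."""
    k, m = len(rates), len(rates[0])
    best = float("inf")
    for flat in product((0, 1), repeat=k * m):
        alarms = [flat[i * m:(i + 1) * m] for i in range(k)]
        best = min(best, loss(alarms, rates, a, b))
    return best


rates = [[3.0, 0.5, 1.5], [0.2, 2.5, 0.4]]   # hypothetical r_i(t), k = 2, m = 3
a, b = 1.0, [1.0, 0.3]
best_threshold = loss(threshold_alarms(rates, a, b), rates, a, b)
```

In this discretized setting (9) equals $a\lambda$ plus $(1/m)\sum (b_i - a r_i(t))$ over the alarmed bins, so each bin can be decided on its own; this is why the exhaustive search and the threshold rule agree.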
The result described above yields an important corollary: 



2.4. The error diagram for space-time prediction in $G = \{G_i\}$ based on $\{I(t)\}$ admits the representation

$$ n(\tau_1, \dots, \tau_k) = \sum_{i=1}^{k} \lambda_i n_i(\tau_i)/\lambda, \tag{10} $$

where $n_i(\tau)$ is the error diagram for time prediction in $G_i$ based on the same data $\{I(t)\}$.

Proof. Consider a loss function (9) such that the hyperplane $\varphi = c$ is tangent to $n(\tau)$ at $\tau_0 = (\tau_{01}, \dots, \tau_{0k})$. The optimal strategy then has the form (8) with $r_{0i} = b_i/a$ and the errors $(n(\tau_0), \tau_0)$. On the other hand, the strategy of the form (8) for time prediction in $G_i$ minimizes the loss function $\varphi_i = a\lambda_i n + b_i \tau$ (Molchan, 1997). The point of the minimum has the coordinate $\tau = \tau_{0i}$, hence the other coordinate is $n = n_i(\tau_{0i})$. Consequently, the collective strategy (8) minimizes

$$ \sum_{i=1}^{k} \varphi_i = a\lambda\Big(\sum_{i=1}^{k} \lambda_i n_i/\lambda\Big) + \sum_{i=1}^{k} b_i \tau_i \tag{11} $$

and has $n = \sum_{i=1}^{k} \lambda_i n_i(\tau_{0i})/\lambda$ as its rate of failures-to-predict. The right-hand side of (11) is identical with $\varphi(n, \tau)$. It follows that (10) holds at $\tau_0$, since the strategy (8) also minimizes (9) and therefore attains $n = n(\tau_0)$. Since $\tau_0$ is arbitrary, the corollary is proven.
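Relation (10) says that the space-time error diagram needs nothing beyond the regional time diagrams and the rates $\lambda_i$. A minimal sketch, assuming each $n_i(\tau)$ is available as a piecewise-linear curve given by its vertices with strictly increasing $\tau$; this representation and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

Diagram = List[Tuple[float, float]]  # vertices (tau, n), tau strictly increasing from 0 to 1


def eval_diagram(diagram: Diagram, tau: float) -> float:
    """Linear interpolation of a piecewise-linear error diagram n_i(tau)."""
    taus = [p[0] for p in diagram]
    j = min(bisect_right(taus, tau), len(diagram) - 1)   # segment [j-1, j] contains tau
    (t0, n0), (t1, n1) = diagram[j - 1], diagram[j]
    return n0 + (n1 - n0) * (tau - t0) / (t1 - t0)


def space_time_n(taus: Sequence[float], diagrams: Sequence[Diagram],
                 lams: Sequence[float]) -> float:
    """Relation (10): n(tau_1, ..., tau_k) = sum_i lambda_i * n_i(tau_i) / lambda."""
    lam = sum(lams)
    return sum(l * eval_diagram(d, t) for l, d, t in zip(lams, diagrams, taus)) / lam


d1 = [(0.0, 1.0), (0.2, 0.4), (1.0, 0.0)]   # hypothetical informative diagram for G_1
d2 = [(0.0, 1.0), (1.0, 0.0)]               # trivial diagram for G_2
value = space_time_n([0.2, 0.5], [d1, d2], [3.0, 1.0])   # (3*0.4 + 0.5)/4 = 0.425
```

If every regional diagram is trivial, $n_i(\tau) = 1 - \tau$, the result is $1 - \sum_i \lambda_i\tau_i/\lambda$, i.e., a point of the simplex (7).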

3 The reduced error diagrams 

The regional error diagrams $n_i(\tau)$ are usually poorly estimated, so for practical purposes the result of a space-time prediction is represented by the two-dimensional diagram $n(\tau_w)$, $\tau_w = \sum_{i=1}^{k} w_i \tau_i$, where the weights satisfy $w_i > 0$ and $\sum_{i=1}^{k} w_i = 1$. This diagram is obtained from the set of "errors" $\mathcal{E}_w = \{(n, \tau_w)\}$ as its lower boundary.



Relation (10) can be used to analyze the properties of $n(\tau_w)$ diagrams. Below we use the following notation: if the set $B$ is the image of $A = \{(n, \tau)\}$ under the mapping

$$ \gamma_w : (n, \tau) \to (n, \tau_w), \qquad \tau_w = \sum_{i=1}^{k} w_i \tau_i, $$

then $B = A_w$; in particular, the image of $\tau$ is $\tau_w$, the image of $\mathcal{E}$ is $\mathcal{E}_w$, and the image of $D$ (see (7)) is $D_w$.
The following is true. 

3.1. $\mathcal{E}_w$ is a convex subset of the square $[0,1]^2$ that contains the diagonal $D: n + \tau_w = 1$;

3.2. $D_w$ is a convex subset of $\mathcal{E}_w$; $D_w$ degenerates to the diagonal of the unit square if and only if $w_i = \lambda_i/\lambda$, $i = 1, \dots, k$;

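3.3. $D_w$ can be obtained as the convex hull of the points

$$ n = 1 - \sum_{i=1}^{k} \lambda_i \varepsilon_i/\lambda, \qquad \tau_w = \sum_{i=1}^{k} w_i \varepsilon_i, \tag{12} $$

where $\{\varepsilon_i\}$ runs over all possible sequences of 0 and 1 (see Fig. 2).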

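In particular, let $w_1 = \dots = w_k$ (this is the case for (3) when $G$ is divided into equal parts). Then the convex minorant of the $(n, \tau_w)$ points

$$ (1, 0),\ \big(1 - \lambda_{(k)}/\lambda,\ 1/k\big),\ \dots,\ \Big(1 - \sum_{i=1}^{p} \lambda_{(k-i+1)}/\lambda,\ p/k\Big),\ \dots,\ (0, 1) $$

gives the lower boundary of $D_w$, while the concave majorant of the points

$$ (1, 0),\ \big(1 - \lambda_{(1)}/\lambda,\ 1/k\big),\ \dots,\ \Big(1 - \sum_{i=1}^{p} \lambda_{(i)}/\lambda,\ p/k\Big),\ \dots,\ (0, 1) $$

gives the upper boundary of $D_w$. Here $\lambda_{(1)} \le \dots \le \lambda_{(k)}$ are the $\{\lambda_i\}$ arranged in increasing order.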
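Statement 3.3 translates directly into a computation: map the $2^k$ vertices $Q(\varepsilon)$ of $D$ by (12) and take their convex hull. The sketch below, with hypothetical rates, computes the lower boundary of $D_w$ with Andrew's monotone-chain hull and compares it, for equal weights, with the sorted construction just given. Points are stored as $(\tau_w, n)$, i.e., with the coordinates in the opposite order to the text, so that the hull routine scans along $\tau_w$.

```python
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (tau_w, n)


def vertices(lams: Sequence[float], w: Sequence[float]) -> List[Point]:
    """Images (12) of the 2**k vertices Q(eps) of the simplex D."""
    lam = sum(lams)
    pts = []
    for eps in product((0, 1), repeat=len(lams)):
        n = 1.0 - sum(l * e for l, e in zip(lams, eps)) / lam
        pts.append((sum(wi * e for wi, e in zip(w, eps)), n))
    return pts


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def lower_hull(points: Sequence[Point]) -> List[Point]:
    """Lower boundary of the convex hull (Andrew's monotone chain)."""
    hull: List[Point] = []
    for p in sorted(set(points)):
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def lower_boundary_equal_weights(lams: Sequence[float]) -> List[Point]:
    """Sorted construction for w_1 = ... = w_k: largest lambda_i switched on first."""
    k, lam = len(lams), sum(lams)
    pts, acc = [(0.0, 1.0)], 0.0
    for p, l in enumerate(sorted(lams, reverse=True), start=1):
        acc += l
        pts.append((p / k, 1.0 - acc / lam))
    return pts


lams = [5.0, 1.0, 2.0, 0.5]                  # hypothetical target rates
hull = lower_hull(vertices(lams, [0.25] * 4))
sorted_pts = lower_boundary_equal_weights(lams)
```

The upper boundary follows in the same way from the upper hull. Enumerating all $2^k$ vertices is practical only for small $k$; for equal weights the sorted construction needs just a sort.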

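3.4. Except for trivial cases, the image of the error diagram $n(\tau)$ is a two-dimensional set (see Fig. 2) with the lower boundary $n(\tau_w)$ and the upper boundary $n_+(\tau_w)$. In the regular case, i.e., $n_i(0) = 1$, $i = 1, \dots, k$, one has

$$ n_+(x) = \max_{i,\,\varepsilon} \big\{ \lambda_i/\lambda \cdot n_i\big(x/w_i - a_i(\varepsilon)\big) + b_i(\varepsilon) \big\}, \tag{13} $$

where

$$ \varepsilon = (\varepsilon_1, \dots, \varepsilon_k),\ \varepsilon_j = 0, 1, \qquad a_i(\varepsilon) = \sum_{j \ne i} w_j \varepsilon_j / w_i, \qquad b_i(\varepsilon) = \sum_{j \ne i} \lambda_j (1 - \varepsilon_j)/\lambda, $$

and the maximum is taken over those $i$ and $(0,1)$ sequences $\varepsilon$ for which the argument of $n_i$ in (13) makes sense, i.e., lies in $[0, 1]$.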

If the $\{n_i(\tau)\}$ are piecewise smooth and $n_i(0) = 1$, $i = 1, \dots, k$, then the image of $n(\tau)$ degenerates to a one-dimensional curve if and only if $\{I(t)\}$ is trivial, i.e., $1 - n(\tau) = \sum_{i=1}^{k} \lambda_i \tau_i/\lambda$, and $w_i = \lambda_i/\lambda$, $i = 1, \dots, k$.
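Formula (13) can be checked numerically for $k = 2$, where each slice $\sum_i w_i\tau_i = x$ of the cube is a segment. The sketch assumes two hypothetical regular diagrams, $n_1(\tau) = (1-\tau)^3$ and $n_2(\tau) = (1-\tau)^2$, and compares (13) with a brute-force grid search along the segment $w_1\tau_1 + w_2\tau_2 = x$; the function names are illustrative.

```python
from itertools import product
from typing import Callable, Sequence

Curve = Callable[[float], float]


def n_upper(x: float, diagrams: Sequence[Curve], lams: Sequence[float],
            w: Sequence[float]) -> float:
    """Upper boundary n_+(x) from (13), regular case n_i(0) = 1."""
    k, lam = len(lams), sum(lams)
    best = float("-inf")
    for i in range(k):
        others = [j for j in range(k) if j != i]
        for eps in product((0, 1), repeat=k - 1):
            a_i = sum(w[j] * e for j, e in zip(others, eps)) / w[i]
            t = x / w[i] - a_i
            if 0.0 <= t <= 1.0:                   # argument of n_i must lie in [0, 1]
                b_i = sum(lams[j] * (1 - e) for j, e in zip(others, eps)) / lam
                best = max(best, lams[i] / lam * diagrams[i](t) + b_i)
    return best


def n_upper_brute(x: float, diagrams: Sequence[Curve], lams: Sequence[float],
                  w: Sequence[float], steps: int = 2000) -> float:
    """Grid maximum of (10) on the segment w_1*tau_1 + w_2*tau_2 = x (k = 2 only)."""
    lam = sum(lams)
    lo, hi = max(0.0, (x - w[1]) / w[0]), min(1.0, x / w[0])
    best = float("-inf")
    for s in range(steps + 1):
        t1 = lo + (hi - lo) * s / steps
        t2 = min(max((x - w[0] * t1) / w[1], 0.0), 1.0)   # clamp rounding errors
        best = max(best, (lams[0] * diagrams[0](t1) + lams[1] * diagrams[1](t2)) / lam)
    return best


def n1(t: float) -> float:
    return (1.0 - t) ** 3     # hypothetical regular diagram for G_1


def n2(t: float) -> float:
    return (1.0 - t) ** 2     # hypothetical regular diagram for G_2


diagrams, lams, w = [n1, n2], [3.0, 1.0], [0.5, 0.5]
```

The two agree because the maximum of the convex function (10) on a segment is attained at an endpoint, which is the vertex argument used in the proof of 3.4.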
3.5. The curve $n(\tau_w)$ represents those strategies which are optimal for loss functions of the form

$$ \varphi(n, \tau) = \psi(n, \tau_w), \qquad \tau_w = \sum_{i=1}^{k} w_i \tau_i. \tag{14} $$

To be specific, if $Q = (n, \tau)$ are the optimal prediction characteristics with respect to a goal function of the form (14), then $Q_w$ belongs to the $n(\tau_w)$ diagram. In addition, $Q_w$ is the point at which the curve $n(\tau_w)$ is tangent to the suitable level set of $\psi$.



3.6. The strategy that optimizes (14) declares an alarm in $G_i \times [t, t + \delta)$ as soon as

$$ r_i(t)/w_i > c, \tag{15} $$

where the threshold $c$ is independent of $G_i$ and $r_i$ is given by (8). According to 2.3,

$$ c = \lambda \, \frac{\partial \psi}{\partial \tau_w}(Q_w) \Big/ \frac{\partial \psi}{\partial n}(Q_w). $$

In particular, if $\varphi = an + b\sum_{i=1}^{k} w_i \tau_i$, then $c = \lambda b/a$. If $w_i = \lambda_i/\lambda$, then (15) takes the form $r_i(t)/\lambda_i > c/\lambda$, where the left-hand side is known as the probability gain.
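Statement 3.6 gives a direct recipe for the reduced diagram: rank all space-time cells by $r_i(t)/w_i$ and switch alarms on in that order. The sketch below reuses a hypothetical discretized setting in which $r_i(t)$ is known exactly on $m$ equal time bins per region; it illustrates the rule and is not an estimation procedure.

```python
from typing import List, Sequence, Tuple


def reduced_diagram(rates: Sequence[Sequence[float]],
                    w: Sequence[float]) -> List[Tuple[float, float]]:
    """(tau_w, n) vertices of rule (15), 'alarm where r_i(t)/w_i > c', as c decreases.

    rates[i][t] = r_i(t) on m equal time bins (assumed known exactly).
    """
    m = len(rates[0])
    total = sum(sum(row) for row in rates)       # proportional to lambda
    cells = sorted(((r / w[i], i, r) for i, row in enumerate(rates) for r in row),
                   reverse=True)
    tau_w, caught = 0.0, 0.0
    points = [(0.0, 1.0)]
    for _, i, r in cells:                         # switch cells on, largest ratio first
        tau_w += w[i] / m                         # one bin of alarm in G_i adds w_i/m
        caught += r
        points.append((tau_w, 1.0 - caught / total))
    return points


trivial = [[2.0, 2.0], [1.0, 1.0]]                   # r_i(t) constant in time: trivial data
by_rate = reduced_diagram(trivial, [2 / 3, 1 / 3])   # w_i = lambda_i / lambda
by_area = reduced_diagram(trivial, [0.5, 0.5])       # equal-area weights (3)
```

With the rate weights every point of `by_rate` satisfies $n + \tau_w = 1$, whereas with the area weights the same trivial data give, e.g., the point $(0.5, 1/3)$, well below the diagonal.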

3.7. For any point $Q$ on the error diagram we can find weights $\{w_i\}$ such that $Q_w$ lies on the reduced $(n, \tau_w)$ diagram, i.e., any optimal strategy can be represented by a suitable $(n, \tau_w)$ diagram. The desired weights are

$$ w_i = -\frac{\partial n}{\partial \tau_i}(Q) \Big/ c, $$

where $c$ is a normalizing constant. The point $Q$ determines the optimal prediction characteristics with respect to the loss function

$$ \varphi = n + c \sum_{i=1}^{k} w_i \tau_i. $$
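Statement 3.7 becomes explicit with the help of (10), which gives $\partial n/\partial\tau_i = \lambda_i n_i'(\tau_i)/\lambda$. A minimal sketch, assuming the regional diagrams are differentiable at the chosen point and their derivatives are supplied as functions; the function name and the example diagrams are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def tangent_weights(tau0: Sequence[float],
                    derivs: Sequence[Callable[[float], float]],
                    lams: Sequence[float]) -> Tuple[List[float], float]:
    """Weights of 3.7, w_i = -(dn/dtau_i)(Q)/c, normalized so that sum w_i = 1.

    Uses (10): dn/dtau_i = lambda_i * n_i'(tau_i) / lambda.
    derivs[i] is the derivative n_i'(tau) of the i-th regional diagram.
    """
    lam = sum(lams)
    grads = [-l * d(t) / lam for l, d, t in zip(lams, derivs, tau0)]  # -dn/dtau_i >= 0
    c = sum(grads)                                                   # normalizing constant
    return [g / c for g in grads], c


def deriv(t: float) -> float:
    """n_i'(tau) for the hypothetical diagram n_i(tau) = (1 - tau)**2."""
    return -2.0 * (1.0 - t)


w, c = tangent_weights([0.5, 0.75], [deriv, deriv], [1.0, 1.0])   # w = [2/3, 1/3], c = 0.75
```

For trivial regional diagrams ($n_i' \equiv -1$) the construction returns $w_i = \lambda_i/\lambda$ at every point, in line with 3.2.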



3.8. The curve $1 - n(\tau_w)$ can be interpreted as a ROC diagram if and only if $w_i = \lambda_i/\lambda$, $i = 1, \dots, k$.

The ROC property of an $(n, \tau_w)$ diagram means that we can treat the $(n, \tau_w)$ characteristics as the errors of the two kinds, $(\beta, \alpha)$, in testing $H_1$ vs. $H_0$, i.e.,

$$ \beta = P(H_0 \mid H_1) = n, \qquad \alpha = P(H_1 \mid H_0) = \tau_w, \tag{16} $$

with $\alpha + \beta = 1$ if the prediction data $\{I(t)\}$ are trivial.

In the case $w_i = \lambda_i/\lambda$ the measures $P(\cdot \mid H_j)$, $j = 0, 1$, can be specified as follows. Both measures define probabilities for the events $\omega = \{I(t), \nu = i\}$, where $\nu$ is the random index of a subregion with the distribution $P(\nu = i) = \lambda_i/\lambda := p_i$. The measure related to the hypothesis $H_0$ is

$$ P(d\omega \mid H_0) = P_0(dI)\,p_i, \qquad \nu(\omega) = i, \tag{17} $$

where $P_0$ is the stationary measure on $I(t)$ induced by the process $\{dN(t), I(t), \pi(t)\}$. In the $H_1$ case

$$ P(d\omega \mid H_1) = r_i(t)/\lambda_i \cdot P(d\omega \mid H_0), \qquad \nu(\omega) = i, \tag{18} $$

where $r_i(t)$ is given by (8).

In other words, testing $H_1$ vs. $H_0$ for the case $G = \{G_i\}$ involves two steps: a random choice of $G_i$ with probabilities $p_i = \lambda_i/\lambda$, $i = 1, \dots, k$, and testing $H_1$ vs. $H_0$ for the relevant subregion. The second step is considered in Molchan and Keilis-Borok (2008).

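The following is a nontrivial corollary of the previous statement:

3.9. In the regular case, $n_i(0) = 1$, $i = 1, \dots, k$, with $\{w_i\} = \{\lambda_i/\lambda\}$, one has

$$ \int_0^1 f\Big(-\frac{dn_\lambda}{d\tau}\Big)\,d\tau = \sum_{i=1}^{k} p_i \int_0^1 f\Big(-\frac{dn_i}{d\tau}\Big)\,d\tau, \qquad p_i = \lambda_i/\lambda, \tag{19} $$

where $f$ is any continuous function and $n_\lambda(\tau)$ is an alternative notation for the $n(\tau_w)$ diagram in the special case $w_i = \lambda_i/\lambda$, $i = 1, \dots, k$.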
If $f = x \ln x$, the quantity

$$ I_\lambda = \int_0^1 f\Big(-\frac{dn_\lambda}{d\tau}\Big)\,d\tau = \int_0^1 \ln\Big(-\frac{dn_\lambda}{d\tau}\Big)\,dn \tag{20} $$

is known in time prediction as the information score (see Kagan, 2007, and Harte & Vere-Jones, 2005).
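Identity (19) is easy to see for piecewise-linear regular diagrams: with $w_i = \lambda_i/\lambda$, the common threshold of 3.6 takes the regional segments in order of decreasing slope, each segment keeps its slope, and its length in $\tau_w$ is scaled by $p_i$. The sketch below builds $n_\lambda$ this way from two hypothetical regional diagrams and evaluates the information score (20) exactly for piecewise-linear curves; the representation and function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Diagram = List[Tuple[float, float]]   # convex (tau, n) vertices from (0, 1) to (1, 0)


def info_score(d: Diagram) -> float:
    """(20) for a piecewise-linear diagram: sum over segments of dtau * f(s), f(x) = x ln x."""
    total = 0.0
    for (t0, n0), (t1, n1) in zip(d, d[1:]):
        s = (n0 - n1) / (t1 - t0)            # -dn/dtau on this segment
        if s > 0.0:                           # f(0) = 0 by continuity
            total += (t1 - t0) * s * math.log(s)
    return total


def merge_diagrams(diagrams: Sequence[Diagram], p: Sequence[float]) -> Diagram:
    """n_lambda for w_i = p_i: regional segments taken in order of decreasing slope.

    Each segment keeps its slope while its tau-length is scaled by p_i.
    """
    segs = []
    for d, pi in zip(diagrams, p):
        for (t0, n0), (t1, n1) in zip(d, d[1:]):
            segs.append(((n0 - n1) / (t1 - t0), pi * (t1 - t0)))
    segs.sort(reverse=True)
    tau, n = 0.0, 1.0
    out = [(tau, n)]
    for s, dt in segs:
        tau += dt
        n -= s * dt
        out.append((tau, n))
    return out


d1 = [(0.0, 1.0), (0.1, 0.5), (1.0, 0.0)]    # hypothetical regional diagrams
d2 = [(0.0, 1.0), (0.5, 0.2), (1.0, 0.0)]
p = [0.7, 0.3]
n_lam = merge_diagrams([d1, d2], p)
lhs = info_score(n_lam)
rhs = p[0] * info_score(d1) + p[1] * info_score(d2)
```

The trivial diagram $n = 1 - \tau$ has a single segment of slope 1 and therefore zero information score.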

Comments. In the non-regular case, $n_\lambda(0) < 1$, the score (19) equals $\infty$ for $f(x)$ unbounded as $x \to \infty$, e.g., $f = x \ln x$. Therefore the scores (19) and (20) are unstable. (An extensive literature on skill scores can be found in Jolliffe & Stephenson, 2003; see also Molchan, 1997, and Harte & Vere-Jones, 2005.) Here we mention only the area skill score, which is used as a stable score (Zechar & Jordan, 2008). A linear transformation of this score reads

$$ A = 2\int_0^1 \big(1 - n_\lambda(\tau) - \tau\big)\,d\tau, \qquad 0 \le A \le 1. \tag{21} $$

Due to the convexity of $n_\lambda(\tau)$, the area under the integrand is approximated by a triangle from inside and by a trapezium from outside. Therefore

$$ H \le A \le H(2 - H), $$

where

$$ H = \max_\tau \big(1 - n_\lambda(\tau) - \tau\big), \qquad 0 \le H \le 1. $$

Thus $\hat{A} = H(3 - H)/2$ is a good estimate of $A$, because

$$ |A - \hat{A}| \le H(1 - H)/2 \le 1/8. \tag{22} $$
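For a piecewise-linear estimate of $n_\lambda(\tau)$, both $A$ and $H$ are exact finite sums, which makes the bounds above and (22) easy to check. A minimal sketch; the example curve is hypothetical and chosen only to be convex and regular.

```python
from typing import List, Tuple


def area_and_h(d: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Area score (21) and H = max(1 - n - tau) for a piecewise-linear n_lambda.

    d lists the (tau, n) vertices of the diagram, with tau increasing from 0 to 1.
    """
    area = 0.0
    for (t0, n0), (t1, n1) in zip(d, d[1:]):
        g0, g1 = 1.0 - n0 - t0, 1.0 - n1 - t1
        area += 0.5 * (g0 + g1) * (t1 - t0)   # trapezoid rule is exact for linear pieces
    h = max(1.0 - n - t for t, n in d)        # a concave piecewise-linear max sits at a vertex
    return 2.0 * area, h


d = [(0.0, 1.0), (0.1, 0.5), (0.4, 0.2), (1.0, 0.0)]   # hypothetical convex, regular curve
A, H = area_and_h(d)
A_hat = H * (3.0 - H) / 2.0                             # estimate of A from H alone
```

Here $H = 0.4$ and $A = 0.52$, inside the interval $[H, H(2-H)] = [0.4, 0.64]$.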
The empirical estimate of the $H$ skill score is unstable for a small number of target events. Due to (22), the same holds for the area skill score.

The $H$ score is convenient for statistical analysis because its empirical estimate is identical in distribution with the Kolmogorov-Smirnov statistic $D_N^{+}$ (Bolshev & Smirnov, 1983), provided $E N_i(T) = \lambda_i T$ and the $\{dN_i\}$ are independent and Poissonian.

4 Proof 

We are going to prove the statements 3.1 - 3.9. 

Proof of 3.1, 3.2. Obviously, the projection $\gamma_w$ preserves convexity. Therefore $\mathcal{E}_w$ and $D_w$ are convex, since $\mathcal{E}$ and $D$ are. If $D_w$ degenerates to the diagonal $D: n + \tau_w = 1$, then the simplex $D$ is given by either of the two equations $n + \sum_{i=1}^{k} w_i \tau_i = 1$ and $n + \sum_{i=1}^{k} \lambda_i \tau_i/\lambda = 1$. Hence $w_i = \lambda_i/\lambda$.

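Proof of 3.3. The simplex $D$ is the convex hull of the $(n, \tau)$ points of the form $Q(\varepsilon) = (1 - \sum_i \lambda_i \varepsilon_i/\lambda, \varepsilon_1, \dots, \varepsilon_k)$, where $\varepsilon_i = 0, 1$. Accordingly, $D_w$ is the convex hull of the points $Q_w(\varepsilon)$; see (12).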

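Proof of 3.4. This statement follows intuitively from dimensionality considerations: the $k$-dimensional surface $n(\tau)$ with $k > 1$ is projected onto the $(n, \tau_w)$ plane, hence its image cannot be one-dimensional in the generic case.

In order to prove (13), we note that a convex function on the polytope $\Delta_x = \{\sum_{i=1}^{k} w_i \tau_i = x,\ 0 \le \tau_i \le 1\}$ reaches its maximum at a vertex of $\Delta_x$, i.e., at a point where $\Delta_x$ meets an edge of the cube $[0,1]^k$. Such a point has the form

$$ \tau = (\varepsilon_1, \dots, \varepsilon_{i-1}, \tau_i, \varepsilon_{i+1}, \dots, \varepsilon_k), \qquad \varepsilon_j = 0, 1, $$

with $\tau_i = x/w_i - a_i(\varepsilon)$. The use of (10) together with $n_j(0) = 1$ and $n_j(1) = 0$ gives (13).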

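Suppose the upper and lower boundaries of the image of $n(\tau)$ are identical and the $\{n_i(\tau)\}$ are piecewise smooth functions. Then $n(\tau)$ is constant on each slice $\Delta_x$. Consider all $\tau = (\tau_1, \dots, \tau_k)$ for which

$$ \sum_{i=1}^{k} \lambda_i n_i(\tau_i)/\lambda = n(\tau_w), \qquad \sum_{i=1}^{k} w_i \tau_i = \tau_w, $$

where $\tau_w$ is fixed.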

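Varying, e.g., $\tau_1$ and $\tau_2$ (with $\tau_1$ regarded as a function of $\tau_2$), we have after differentiation

$$ \lambda_1 n_1'(\tau_1)\,\tau_1' + \lambda_2 n_2'(\tau_2) = 0, \qquad \tau_1' = -w_2/w_1. \tag{23} $$

If $\tau_1, \tau_2$ are points of smoothness of $n_i(\tau)$, $i = 1, 2$, then differentiating (23) once more gives

$$ \lambda_1 n_1''(\tau_1)(w_2/w_1)^2 + \lambda_2 n_2''(\tau_2) = 0. $$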

However, $n_i''(\tau_i) \ge 0$, $i = 1, 2$, by convexity. Hence $n_i''(\tau_i) = 0$, i.e., the $n_i(\tau)$ are locally linear at all points of smoothness. Since the $n_i(\tau)$ are piecewise smooth, for any point $\tau_1$ where $n_1'(\cdot)$ is discontinuous one can find a point $\tau_2$ where $n_2(\cdot)$ is smooth. Consequently, when $n_1'$ is discontinuous at $\tau_1$, one should replace $n_1'(\tau_1)$ in equation (23) with $n_1'(\tau_1 + 0)$ and with $n_1'(\tau_1 - 0)$. But then (23) implies that $n_1'(\tau)$ is continuous at $\tau_1$; hence all functions $n_i(\tau)$ are linear. Taking the boundary conditions $n_i(0) = 1$ and $n_i(1) = 0$ into account, we have $n_i(\tau) = 1 - \tau$. However, in that case $\mathcal{E} = D$, and, by virtue of 3.2, $w_i = \lambda_i/\lambda$.

Proof of 3.5. Let $Q_w$ be the point where the convex set $\{\psi \le c\}$ is tangent to the convex curve $n(\tau_w)$. The function $\psi$ reaches its minimum on $\mathcal{E}_w$ at the point $Q_w$, because the sets $\{\psi \le c\}$ increase with increasing $c$. Since $Q_w \in \mathcal{E}_w$, its preimage $Q = (n, \tau)$ belongs to $\mathcal{E}$. At this point $\varphi(Q) = \psi(Q_w)$ reaches its minimum on $\mathcal{E}$, hence $Q$ belongs to the surface $n(\tau)$.

Proof of 3.6. This follows from 2.3.
Proof of 3.7. Let $Q = (n_0, \tau_{01}, \dots, \tau_{0k})$ belong to $n(\tau)$. If $w_i = -\frac{\partial n}{\partial \tau_i}(Q)/c$, then the equation

$$ n + c\sum_{i=1}^{k} w_i \tau_i = n_0 + c\sum_{i=1}^{k} w_i \tau_{0i} \tag{24} $$

defines the tangent plane to $n(\tau)$ at $Q$. Since $n(\tau)$ is convex and decreasing, it follows that $w_i \ge 0$ and that $\mathcal{E}$ lies on one side of the plane (24). Consequently, a strategy with the characteristics $Q = (n_0, \tau_{01}, \dots, \tau_{0k})$ optimizes the losses $\varphi = n + c\sum_{i=1}^{k} w_i \tau_i$. Using 3.5, we complete the proof.
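Proof of 3.8. By (10) and (16) one has

$$ \beta = n = \sum_{i=1}^{k} \lambda_i/\lambda \cdot n_i(\tau_i), \qquad \alpha = \tau_w = \sum_{i=1}^{k} w_i \tau_i. $$

In the trivial case of $I(t)$, one has $n_i(\tau) = 1 - \tau$ and $\alpha + \beta = 1$. Hence

$$ \beta = 1 - \sum_{i=1}^{k} \lambda_i/\lambda \cdot \tau_i, \qquad \alpha = \sum_{i=1}^{k} w_i \tau_i = 1 - \beta, $$

i.e., $w_i = \lambda_i/\lambda$, $i = 1, \dots, k$.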

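Conversely, suppose that $\{w_i\} = \{\lambda_i/\lambda\}$. The likelihood ratio of the measures (18) and (17) at the point $\omega = (I(t), i)$ is

$$ L(\omega) = P(d\omega \mid H_1)/P(d\omega \mid H_0) = r_i(t)/\lambda_i. $$

Accepting the hypothesis $H_1$ as soon as $L(\omega) > c$, and $H_0$ otherwise, one has

$$ \alpha = \int_{L > c} P(d\omega \mid H_0) = \sum_{i=1}^{k} E\,\mathbf{1}(r_i/\lambda_i > c) \cdot \lambda_i/\lambda = \sum_{i=1}^{k} \tau_i \lambda_i/\lambda = \tau_w, $$

$$ \beta = \int_{L \le c} L(\omega)\,P(d\omega \mid H_0) = \sum_{i=1}^{k} E\big[r_i/\lambda_i \cdot \mathbf{1}(r_i/\lambda_i \le c)\big] \cdot \lambda_i/\lambda = \sum_{i=1}^{k} n_i(\tau_i)\lambda_i/\lambda = n. $$

Here we have used 2.1 and 2.2.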

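Proof of 3.9. Let us consider a testing problem $H_1$ vs. $H_0$ with the errors $\beta = P_1(L < c)$ and $\alpha = P_0(L > c)$, where $L(\omega) = dP_1/dP_0$ is the likelihood ratio. Obviously,

$$ E_0 f(L) := \int f(L(\omega))\,dP_0(\omega) = \int f(c)\,dF_L(c), $$

where $F_L$ is the distribution of $L$ with respect to the measure $P_0$. But $d\beta = c\,dF_L(c)$ and $d\alpha = -dF_L(c)$. Therefore

$$ E_0 f(L) = \int_0^1 f\Big(-\frac{d\beta}{d\alpha}\Big)\,d\alpha. $$

Applying this relation to the case (16), (17), (18), one has

$$ \int_0^1 f\Big(-\frac{dn_\lambda}{d\tau}\Big)\,d\tau = E_0 f(L) = \sum_{i=1}^{k} p_i\, E_0 f(L_i) = \sum_{i=1}^{k} p_i \int_0^1 f\Big(-\frac{dn_i}{d\tau}\Big)\,d\tau. $$

Here $L_i$ is the likelihood ratio $dP_1/dP_0$ for $G_i$.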

5 Conclusion and Discussion 

1. Results. In the case of time prediction, the error set $\mathcal{E}$ is organized as follows: all trivial strategies are concentrated on the diagonal $n + \tau = 1$ of the square $[0,1]^2$, while the optimal strategies lie on the lower boundary of $\mathcal{E}$, viz. $n(\tau)$. In the case of space-time prediction, the two-dimensional images of $\mathcal{E}$, i.e., $\mathcal{E}_w$, are organized differently: the diagonal $n + \tau_w = 1$ does not include all trivial strategies, and the $(n, \tau_w)$ diagram does not include all optimal strategies (see Fig. 2).
Nevertheless, $n(\tau_w)$ is a convenient tool for visualizing those optimal strategies that are suitable for a trade-off between $n$ and $\tau_w$. However, if $\{w_i\} \ne \{\lambda_i/\lambda\}$, then the distance of $n(\tau_w)$ from the diagonal $n + \tau_w = 1$ tells us nothing about the prediction potential of the relevant strategies. To learn something about this potential, we need the image $D_w$ of the trivial strategies on the $(n, \tau_w)$ plane. The lower boundary of $D_w$ may be very close to the ideal strategy with the errors $(0, 0)$.

Let us consider an example. The relative intensity (RI) method (Tiampo et al., 2002) predicts the target event in those locations where the historical seismicity rate, $f(g)$, is the highest, $f > c$. RI is a typical example of a trivial strategy occasionally employed as an alternative to meaningful prediction techniques (see, e.g., Marzocchi et al., 2003). In the RI method, $\tau_i = 1$ if $f > c$ in the $i$-th bin, and $\tau_i = 0$ otherwise. If $\{w_i = \lambda_i/\lambda\}$, then

$$ 1 - n = \int_{f > c} f(g)\,dg = \tau_w, $$

i.e., $n + \tau_w = 1$ for any level $c$. If $w_i = |G_i|/|G|$, where $|G|$ is the area of $G$, then the curve $n(\tau_w)$ can be obtained by using (12) (see also Zechar and Jordan, 2008). The curve passes close to $(0, 0)$ if most of the target events occur in a relatively small area, say, $\lambda_1/\lambda$ is close to 1 and $w_1$ is close to 0.
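The RI example can be reproduced on a hypothetical partition into five equal-area cells in which one cell carries 90% of the target events. The sketch assumes that the historical rate used by RI ranks the cells exactly as the true rates $\lambda_i$ do; it then compares the $(\tau_w, n)$ points under the area weights (3) and the rate weights (4).

```python
from typing import List, Sequence, Tuple


def ri_curve(lams: Sequence[float], w: Sequence[float]) -> List[Tuple[float, float]]:
    """(tau_w, n) for RI alarms: whole cells switched on in order of decreasing rate.

    Assumes the historical rate used by RI is proportional to the target rate lambda_i.
    """
    lam = sum(lams)
    order = sorted(range(len(lams)), key=lambda i: lams[i], reverse=True)
    tau_w, caught, pts = 0.0, 0.0, [(0.0, 1.0)]
    for i in order:
        tau_w += w[i]          # tau_i = 1 in an alarmed cell
        caught += lams[i]
        pts.append((tau_w, 1.0 - caught / lam))
    return pts


lams = [90.0, 4.0, 3.0, 2.0, 1.0]            # most events fall in one cell
area_w = [0.2] * 5                           # equal-area cells, weights (3)
rate_w = [l / sum(lams) for l in lams]       # weights (4)
by_area = ri_curve(lams, area_w)
by_rate = ri_curve(lams, rate_w)
```

Under area weights the first RI alarm already reaches $(\tau_w, n) = (0.2, 0.1)$, far below the diagonal, although RI is a trivial strategy; under rate weights every point stays on $n + \tau_w = 1$.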

One gets a unique set of weights by choosing $w_i = \lambda_i/\lambda$ (see 3.2, 3.8). Only in this particular case are all trivial strategies projected onto the diagonal $D: n + \tau_w = 1$, and only then is $1 - n(\tau_w)$ a ROC curve. Besides, the projection onto the $(n, \tau_w)$ plane preserves the relative distance between any strategy and the set of trivial strategies. More specifically, the following relations hold:

$$ \frac{\rho(Q, D)}{\rho(O, D)} = 1 - n - \sum_{i=1}^{k} \tau_i \lambda_i/\lambda = \frac{\rho(Q_w, D_w)}{\rho(O_w, D_w)} = 1 - n - \tau_w \tag{25} $$
(Molchan and Keilis-Borok, 2008). Here $\rho$ is the Euclidean distance; e.g., $\rho(Q, D)$ is the distance from $Q = (n, \tau)$ to the hyperplane $D = \{n + \sum_i \tau_i \lambda_i/\lambda = 1\}$, and $O = (0, 0, \dots, 0)$ corresponds to the ideal strategy. The right-hand side of (25) is known in contingency table analysis as the HK skill score (Hanssen and Kuipers, 1965). Consequently, when $\{w_i = \lambda_i/\lambda\}$, the quantity $H = \max(1 - n(\tau_w) - \tau_w)$ gives the greatest relative distance between the optimal and the trivial strategies.
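Relation (25) can be checked numerically by computing Euclidean distances to the hyperplane $D$ in $[0,1]^{k+1}$ and to the diagonal in the plane. A minimal sketch with a hypothetical point $Q$ below $D$ and hypothetical rates; the helper name is illustrative.

```python
import math
from typing import Sequence


def dist_to_hyperplane(x: Sequence[float], a: Sequence[float]) -> float:
    """Euclidean distance from point x to the hyperplane {y : a . y = 1}."""
    dot = sum(ai * xi for ai, xi in zip(a, x))
    return abs(dot - 1.0) / math.sqrt(sum(ai * ai for ai in a))


lams = [5.0, 2.0, 1.0]
lam = sum(lams)
n, taus = 0.3, [0.4, 0.1, 0.6]             # a hypothetical point Q below D
a_full = [1.0] + [l / lam for l in lams]   # D: n + sum_i lambda_i tau_i / lambda = 1
q_full = [n] + taus
origin = [0.0] * len(q_full)               # ideal strategy O
ratio_full = dist_to_hyperplane(q_full, a_full) / dist_to_hyperplane(origin, a_full)

tau_w = sum(l * t for l, t in zip(lams, taus)) / lam   # weights w_i = lambda_i / lambda
ratio_plane = (dist_to_hyperplane([n, tau_w], [1.0, 1.0])
               / dist_to_hyperplane([0.0, 0.0], [1.0, 1.0]))
hk = 1.0 - n - tau_w                                   # HK score of the point
```

Both ratios equal the HK score $1 - n - \tau_w = 0.35$ here, because the normalizing factors of the two distances cancel in each ratio.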

The use of weights other than $\{\lambda_i/\lambda\}$ at the research stage is justified by the difficulty of estimating the $\{\lambda_i\}$. This justification is illusory, however. One must know the lower boundary of $D_w$ in order to answer the question of how nontrivial the $n(\tau_w)$ diagram is. But this again requires knowledge of the $\{\lambda_i\}$ (see (12) and Fig. 2).

2. The relation to SDT. In recent years, studies in earthquake prediction have been actively using Signal Detection Theory (SDT), developed in the late 1980s in the atmospheric sciences (see, e.g., Jolliffe and Stephenson, 2003, and the references therein). The main object of this theory is a warning system, which characterizes the state of hazard by a scalar quantity $\xi$. The system is tested by making $K \gg 1$ trials, in which the event $\{\xi_i > u\}$ in the $i$-th trial is interpreted as an alarm, $X_i = 1$; otherwise $X_i = 0$. The results are compared with the observations $x_i$ = Yes or No with respect to a target event. Any dependence between the members of the sequence $\{(x_i, X_i)\}$ is ignored a priori. It is required only that the rate of target events ($x$ = Yes) satisfy $0 < s < 1$. This condition is essential for getting an acceptable estimate of the joint distribution of $(x_i, X_i)$. Note that $s = 0$ in our approach.

Two problems are formulated: assessing the prediction performance and choosing the threshold $u$ in a rational manner. The first problem is attacked using the 2×2 contingency table of forecasts, and the second using the ROC diagram related to testing hypotheses about the conditional distribution of $\xi$ given $x$ = Yes and given $x$ = No.
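The two SDT tools can be sketched in a few lines. The example below assumes independent trials, as SDT does a priori, and uses hypothetical scores $\xi$; for each threshold $u$ it forms the 2×2 table, whose hit rate is $1 - n$ and whose false-alarm rate plays the part of $\tau$ (close to the alarm rate when $s$ is small). The difference of the two rates is the HK score.

```python
from typing import List, Sequence, Tuple


def contingency(xi: Sequence[float], x: Sequence[bool], u: float) -> Tuple[int, int, int, int]:
    """2x2 table (hits, false alarms, misses, correct negatives) for alarms {xi > u}."""
    a = b = c = d = 0
    for score, event in zip(xi, x):
        alarm = score > u
        if alarm and event:
            a += 1
        elif alarm:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def roc_points(xi: Sequence[float], x: Sequence[bool]) -> List[Tuple[float, float]]:
    """(false-alarm rate, hit rate) as the threshold u is lowered past each score."""
    pts = []
    for u in sorted(set(xi), reverse=True) + [float("-inf")]:
        a, b, c, d = contingency(xi, x, u)
        pts.append((b / (b + d), a / (a + c)))
    return pts


xi = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]           # hypothetical warning-system outputs
x = [True, False, True, False, False, False]  # observed target events
roc = roc_points(xi, x)
hk_best = max(hit - fa for fa, hit in roc)    # HK score maximized over u
```

In this toy run the best threshold is $u = 0.4$, with hit rate 1, false-alarm rate 0.25 and HK score 0.75.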

In our terminology this is a situation with discrete "time", where the data $I$ in a trial are given by $\xi$. Therefore SDT is equivalent to the analysis of time prediction of earthquakes using a specified precursor/algorithm, even though the prediction of large earthquakes involves $s \ll 1$. The ROC/$n(\tau)$ diagram then quantifies the predictive potential of the precursor, $\xi$ in this case. All meaningful strategies are functions of $\xi$ and hence reduce to choosing the level $u$.

For arbitrary data $I(t)$, $n(\tau)$ characterizes the prediction performance of $\{I(t)\}$ and gives the lower bound for the $(n, \tau)$ curves of any algorithm based on $\{I(t)\}$ (equivalently, $1 - n(\tau)$ bounds their ROC curves from above). The studies of Molchan (1990, 1997) answer the question of how the quantity $\xi$ should be constructed from the original prediction data and why the relation to hypothesis testing arises at all.

The gist of the matter lies in the fact that the 2×2 contingency table is defined by three parameters, $(n, \tau, s)$, and the program of prediction optimization is formulated, explicitly or implicitly, in terms of that table. As a result, one has to deal with locally optimal strategies only. When real time is incorporated into the SDT framework, additional parameters arise that are important for seismological practice, e.g., the rate of connected alarms (alarm clusters), $\nu$. Optimizing the loss function $\varphi = an + b\tau + c\nu$ at once takes us beyond the SDT framework and its tools. The strategies that optimize $\varphi$ are not locally optimal and can be found from Bellman-type equations (Molchan and Kagan, 1992; Molchan, 1997).

The use of the SDT approach in space-time prediction imposes a rather unrealistic limitation: the spatial rate of target events must be homogeneous. Otherwise the ROC diagram loses its meaning and becomes an $(n, \tau_w)$ diagram (see Fig. 2).
Figure captions

Fig. 1. Space-time prediction characteristics: $n$ vs. $\tau = (\tau_1, \dots, \tau_k)$ (the horizontal axis is multidimensional).

Notation: $\mathcal{E}(I)$ represents all strategies based on the data $I$, the hyperplane $D$ represents the trivial strategies (random guesses), and the surface $n(\tau)$ represents the optimal strategies (the error diagram). The level sets of the loss function $\varphi(n, \tau)$ are shown by dashed lines; the characteristic of the optimal prediction is the tangent point $Q$ between $n(\tau)$ and the suitable level set of $\varphi$.

Fig. 2. The reduced error diagram: $n$ vs. $\tau_w = \sum_{i=1}^{k} w_i \tau_i$.
Notation: $\mathcal{E}_w$, contoured by bold lines, represents all strategies of $\mathcal{E}$ in the $(n, \tau_w)$ coordinates; the stippled zone $D_w$ represents the trivial strategies; the broken line within $D_w$ illustrates the method used to construct $D_w$ (see 3.3); the filled zone is the image of the $n(\tau)$ diagram; isolines of the loss function $\psi = \psi(n, \tau_w)$ are shown by dashed lines; $\psi$ yields the optimal characteristics


